Database of Mandarin Neighborhood Statistics
Abstract
In designing controlled experiments with language stimuli, researchers in psycholinguistics, neurolinguistics, and related fields require language resources that isolate variables known to affect language processing. This article describes a freely available database that provides word-level statistics for words and nonwords of Mandarin Chinese. The featured lexical statistics include subtitle corpus frequency, phonological neighborhood density, neighborhood frequency, and homophone density. The accompanying word descriptors include pinyin, ASCII phonetic transcription (SAMPA), lexical tone, syllable structure, dominant part of speech (PoS), and syllable, segment, and pinyin lengths for each phonological word. The database is designed for researchers particularly concerned with the processing of isolated words, and it accommodates multiple existing hypotheses concerning the structure of the Mandarin syllable. It is divided into multiple files according to the desired search criteria: 1) the syllable segmentation schema used to calculate density measures, and 2) whether the search is for words or nonwords. The database is open to the research community at https://github.com/karlneergaard/Mandarin-Neighborhood-Statistics.
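As a rough illustration of what a phonological neighborhood density measure counts, the sketch below uses the common one-segment definition: two words are neighbors if one can be turned into the other by substituting, adding, or deleting a single segment. It is not the database's own procedure. The toy lexicon, its segment symbols, and the choice to treat lexical tone as a final segment are all assumptions made for the example; the database itself offers several segmentation schemas.

```python
def is_neighbor(a, b):
    """Return True if segment tuples a and b differ by exactly one
    substitution, insertion, or deletion (identical forms are not neighbors)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: deleting one segment from the longer form
    # must yield the shorter form (insertion/deletion).
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_density(lexicon):
    """Map each word to the number of other lexicon entries that are its neighbors."""
    return {
        word: sum(is_neighbor(form, other)
                  for key, other in lexicon.items() if key != word)
        for word, form in lexicon.items()
    }


# Hypothetical toy lexicon: pinyin keys, illustrative segments with tone last.
lexicon = {
    "ma1": ("m", "a", "1"),
    "ma3": ("m", "a", "3"),
    "ba1": ("b", "a", "1"),
    "man1": ("m", "a", "n", "1"),
    "a1": ("a", "1"),
    "shu4": ("sh", "u", "4"),
}

density = neighborhood_density(lexicon)
for word, count in density.items():
    print(word, count)
```

Changing how a syllable is split into segments (for example, treating tone as a separate unit or attaching it to the vowel) changes which forms count as neighbors, which is why density values depend on the segmentation schema chosen.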
Similar resources
The Effects of Phonological Neighborhoods on Spoken Word Recognition in Mandarin Chinese
Pei-Tzu Tsai, Master of Arts, 2007. Directed by: Professor Nan Bernstein Ratner, Department of Hearing and Speech Sciences, and Associate Professor Rochelle Newman, Department of Hearing and Speech Sciences. Spoken word recognition is influenced by words similar to the target word with one phoneme...
The design and application of a speech database for Chinese TTS system
This paper presents the design and application of a speech database for a Mandarin TTS system. The main aim of the research is to build a scientific, versatile speech database that meets the call for improving the quality of synthesis units and enhancing previous prosodic models. The database structure and contents and the methodology for creating similar databases are described, and also s...
A Review of Statistics and Probability Journals in ISI Database
As scientific productivity related to the ISI database and other related databases has increased in recent years, it is worthwhile for researchers of Statistics in Iran to know more about these journals and their status in the ISI database. In this study, using bibliometric methods, we have reviewed the status of Statistics and Probability journals. From all nations around the world, these are only...
A Prosodic Labeling System for Mandarin Speech Database
A working database needs tools to transcribe and label at both phonetic and prosodic levels. While the proposed phonetic transcription system is a simplified form of the International Phonetic Alphabet (IPA) following the SAMPA guidelines, the prosodic labeling system is an elaborated form of the ToBI (Tone and Break Indices) framework adopted for Mandarin. In particular, the proposed prosodic ...
Multi-accented Mandarin Database Construction and Benchmark Evaluations
In this paper, we describe the design, recording, and checking procedures of a multi-accented Mandarin speech database, and present a benchmark evaluation of this database. The database was recorded in 6 cities in China and contains 1200 speakers' accented Mandarin speech of continuous digits, isolated words, and sentences. In total, 520k utterances (572.5 hours) were collected. We perform the in...